Processing apparatus, processing method and compiler

ABSTRACT

Differences in encoding efficiency of instructions may arise if certain operations require very large immediate values as operands, as opposed to others requiring no immediate values or small immediate values. The present invention describes a processing apparatus, a compiler as well as a method for processing data, allowing the use of instructions that require large immediate data, while simultaneously maintaining an efficient encoding and decoding of instructions. The processing apparatus comprises a plurality of issue slots (UC₀, UC₁, UC₂, UC₃), wherein each issue slot comprises a plurality of functional units (FU₂₀, FU₂₁, FU₂₂). The processing apparatus is arranged for processing data, based on control signals generated from a set of instructions being executed in parallel. The processing apparatus further comprises a dedicated issue slot (UC₄) arranged for loading an immediate value (IMV1) in dependence upon a dedicated instruction (IMM). The immediate value (IMV1) can be stored in a dedicated register file (RF₂) and the issue slot requiring this value can retrieve it from the dedicated register file (RF₂).

TECHNICAL FIELD

The invention relates to a processing apparatus conceived for processing data, based on control signals generated from a set of instructions being executed in parallel, comprising: a plurality of issue slots, wherein each issue slot comprises a plurality of functional units, the plurality of issue slots being controlled by a set of control words, corresponding to the set of instructions.

The invention further relates to a method for processing data, said method comprising the following steps:

storing input data in a register file;

processing data retrieved from the register file based on control signals generated from a set of instructions being executed in parallel, using a plurality of issue slots controlled by a set of control words being generated from the set of instructions;

and wherein each issue slot comprises a plurality of functional units.

The invention further relates to an instruction set, comprising a plurality of instructions for execution by said processing apparatus.

The invention further relates to a computer program comprising computer program code means for instructing a computer system to perform the steps of said method.

The invention further relates to a compiler program product for generating a sequence of sets of instructions being arranged for execution by said processing apparatus.

The invention still further relates to an information carrier comprising a sequence of sets of instructions being arranged for execution by said processing apparatus.

BACKGROUND ART

Computer architectures consist of a fixed data path, which is controlled by a set of control words. Each control word controls parts of the data path and these parts may comprise register addresses and operation codes for arithmetic logic units (ALUs) or other functional units. Each set of instructions generates a new set of control words, usually by means of an instruction decoder which translates the binary format of the instruction into the corresponding control word, or by means of a micro store, i.e. a memory which contains the control words directly. Typically, a control word represents a RISC like operation, comprising an operation code, two operand register indices and a result register index. The operand register indices and the result register index refer to registers in a register file.

In case of a Very Large Instruction Word (VLIW) processor, multiple instructions are packaged into one long instruction, a so-called VLIW instruction. A VLIW processor uses multiple, independent functional units to execute these multiple instructions in parallel. The processor allows exploiting instruction-level parallelism in programs and thus executing more than one instruction at a time. In order for a software program to run on a VLIW processor, it must be translated into a set of VLIW instructions. The compiler attempts to minimize the time needed to execute the program by optimizing parallelism. The compiler combines instructions into a VLIW instruction under the constraint that the instructions assigned to a single VLIW instruction can be executed in parallel and under data dependency constraints. Encoding of instructions can be done in two different ways, for a data stationary VLIW processor or for a time stationary VLIW processor, respectively. In case of a data stationary VLIW processor all information related to a given pipeline of operations to be performed on a given data item is encoded in a single VLIW instruction. For time stationary VLIW processors, the information related to a pipeline of operations to be performed on a given data item is spread over multiple instructions in different VLIW instructions, thereby exposing said pipeline of the processor in the program.

In practical applications, the functional units will be active all together only rarely. Therefore, in some VLIW processors, fewer instructions are provided in each VLIW instruction than would be needed for all the functional units together. Each instruction is directed to a selected functional unit that has to be active, for example by using multiplexers. In this way it is possible to save on instruction memory size while hardly compromising performance. In this architecture, instructions are directed to different functional units in different clock cycles. The corresponding control words are issued to a respective issue slot of the VLIW issue register. Each issue slot is associated with a group of functional units. A particular control word is directed to a specific one among the functional units of the group that is associated with the particular issue slot.

The encoding of parallel instructions in a VLIW instruction leads to a severe increase of the code size. Large code size leads to an increase in program memory cost both in terms of required memory size and in terms of required memory bandwidth. In modern VLIW processors different measures are taken to reduce the code size. One important example is the compact representation of no operation (NOP) operations in a data stationary VLIW processor, i.e. the NOP operations are encoded by single bits in a special header attached to the front of the VLIW instruction, resulting in a compressed VLIW instruction.

Instruction bits may still be wasted in each instruction of a VLIW instruction, because some instructions can be encoded in a more compact way than others can. Differences in encoding efficiency of instructions arise, for instance, because certain operations require very large immediate values as operands, as opposed to others requiring no immediate values or small immediate values. Instructions requiring very large immediate values are commonly used for initialization of register values. Especially in processors with a large datapath width, typically larger than 16 bits, it can be very expensive to initialize registers using a single instruction. Encoding the immediate value alone already requires as many bits as the datapath width, and additional bits for encoding of the operation code and register index are required as well. In case different instructions also have to be encoded for the same issue slot and these instructions require fewer bits, a very inefficient instruction encoding is obtained for this particular issue slot. This is, for instance, the case in VLIW architectures with a fixed control word width, since in combination with a varying instruction width the decoding process becomes less efficient.

U.S. Pat. No. 5,745,722 describes an apparatus for executing a program which contains immediate data and a program conversion method for generating an instruction which the apparatus can carry out. The program conversion method is used for encoding immediate data at the time of converting a program into a desired program format, thereby reducing the size of an instruction code. The program conversion method is mainly used by a compiler. When executing the resulting program, instructions, including instructions having immediate data, are sequentially fetched and decoded so that an execution section carries out the fetched instruction. When the instruction decoder detects that the instruction code contains immediate data, the immediate data is transmitted to a data decoder. When the immediate data is encoded, the data decoder decodes the data according to a given rule, thereby generating decoded immediate data. The decoded immediate data is then transmitted to an execution unit to be processed. When the supplied immediate data is not encoded, the data decoder sends the data intact to the execution unit.

It is a disadvantage of the prior art processing apparatus that during decoding of instructions an additional step is required for decoding of encoded immediate data.

DISCLOSURE OF INVENTION

An object of the invention is to provide a processing apparatus that allows the use of large immediate values as operands, while maintaining an efficient encoding and decoding of instructions.

This object is achieved with a processing apparatus of the kind set forth, characterized in that the processing apparatus further comprises a dedicated issue slot arranged for loading an immediate value in dependence upon a dedicated instruction comprising the immediate value. In the processing apparatus according to the invention, large immediate values do not have to be encoded in the instructions issued to the plurality of issue slots, but they can be encoded in the dedicated instruction issued to the dedicated issue slot. As a result, the width of the corresponding instructions decreases, which may allow a more efficient encoding of instructions for that particular issue slot. Since large immediate values are commonly used only for initializing register values, a significant increase in efficiency can be obtained.

An embodiment of the processing apparatus according to the invention is characterized in that the dedicated issue slot comprises a single functional unit arranged for only executing the dedicated instruction. As a result, only one type of instruction needs to be encoded for the dedicated issue slot, meaning that no bits for encoding the operation code are required, reducing the width of the dedicated instruction.

An embodiment of the processing apparatus according to the invention is characterized in that the processing apparatus further comprises a dedicated register file for storing said immediate value, the dedicated register file being accessible by the dedicated issue slot. Therefore, the number of bits in the dedicated instruction required for encoding the destination of result data can be reduced.

An embodiment of the processing apparatus according to the invention is characterized in that said processing apparatus is a VLIW processor and in that said set of instructions is grouped in a VLIW instruction. A VLIW processor allows executing multiple instructions in parallel, increasing the overall speed of operation, while having relatively simple hardware.

According to the invention a method for processing data is characterized in that the method further comprises a step of loading an immediate value into a dedicated issue slot in dependence upon a dedicated instruction comprising the immediate value. The method allows the use of instructions requiring large immediate values, while maintaining an efficient encoding and decoding process.

According to the invention an instruction set is characterized in that the instruction set further comprises a dedicated instruction having an immediate value, which dedicated instruction when executed by a dedicated issue slot causes the dedicated issue slot to load the immediate value. The instruction set according to the invention allows the use of operations using large immediate values, while not compromising on the efficiency of encoding and decoding of instructions.

According to the invention a compiler program product is characterized in that the sequence of sets of instructions further comprises a dedicated instruction having an immediate value, which dedicated instruction when executed by a dedicated issue slot causes the dedicated issue slot to load the immediate value. The compiler according to the invention generates efficiently encoded instructions, allowing the use of large immediate values, while maintaining an efficient decoding process for the instructions.

Preferred embodiments of the invention are defined in the dependent claims. A computer program for implementing the method according to the invention for processing data is defined in claim 10. An information carrier comprising a sequence of sets of instructions being arranged for execution by a processing apparatus according to the invention for processing data is defined in claim 12.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram of a VLIW processor according to the invention.

FIG. 2 is a schematic block diagram of issue slot UC₂, comprising three functional units.

FIG. 3 is a schematic block diagram of dedicated issue slot UC₄, comprising a single functional unit.

FIG. 4 shows a VLIW instruction and a corresponding compressed VLIW instruction.

DESCRIPTION OF PREFERRED EMBODIMENTS

Referring to FIG. 1, a schematic block diagram illustrates a VLIW processor comprising a plurality of issue slots, including issue slots UC₀, UC₁, UC₂ and UC₃, and a register file, including register file segments RF₀ and RF₁. The processor has a controller SQ and a connection network CN for coupling the register file segments RF₀ and RF₁, and the issue slots UC₀, UC₁, UC₂ and UC₃. The register file segments RF₀ and RF₁ are coupled to a bus, not shown in FIG. 1, and via this bus the register file segments receive input data. Instructions issued to issue slots UC₀, UC₁, UC₂ and UC₃ comprise RISC like operations, requiring only two operands and producing only one result, as well as custom operations that can consume more than two operands and/or that can produce more than one result. Some instructions may require small or large immediate values as operand data. Connection network CN allows passing of input data and result data between the register file segments RF₀ and RF₁, and the issue slots UC₀, UC₁, UC₂ and UC₃.

The VLIW processor further comprises a dedicated issue slot UC₄ and a dedicated register file RF₂. The dedicated issue slot UC₄ is coupled, via coupling 101, to the dedicated register file RF₂. The dedicated register file RF₂ is coupled to the plurality of issue slots UC₀-UC₃ via the connection network CN. A dedicated instruction is issued to issue slot UC₄, and this instruction comprises an immediate value, which is needed during execution of an instruction in one of the plurality of issue slots UC₀-UC₃. The dedicated issue slot UC₄ comprises a single functional unit IMU, which is capable of executing the dedicated instruction. When executing this instruction, the corresponding immediate value is passed to the dedicated register file RF₂ by the dedicated issue slot UC₄. This immediate value can be read from the dedicated register file RF₂ by the plurality of issue slots UC₀-UC₃, via the connection network CN, and subsequently used for further processing.

In an advantageous embodiment, the dedicated register file RF₂ comprises a single register. As a result, no bits in the dedicated instruction are required for encoding of the register address. Furthermore, only the issue slot UC₄ is allowed to write data to the dedicated register file RF₂, so that no bits are required in the dedicated instruction for encoding the selection of the register file to which data have to be written. Finally, the issue slot UC₄ comprises only a single functional unit IMU and only one type of instruction, i.e. the dedicated instruction, is issued to issue slot UC₄. As a result only one instruction needs to be encoded for issue slot UC₄, meaning that no bits are required in the dedicated instruction for encoding the operation code. Since loading an immediate value does not require any operands, operand register indices do not have to be encoded in the dedicated instruction either. As a final result, the dedicated instruction for loading of an immediate value can be encoded with a number of bits equal to the width of the immediate value itself.
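The bit accounting of the preceding paragraph can be illustrated by a minimal sketch. The field widths used here (a 32-bit immediate, a 6-bit operation code, a 5-bit result register index and a 1-bit register file select) are assumptions chosen for illustration only and are not taken from the description; the class name is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImmediateLoadFormat:
    """Hypothetical field widths, in bits, of an immediate-load instruction."""
    immediate_bits: int
    opcode_bits: int = 0            # not needed if only one instruction type exists
    result_register_bits: int = 0   # not needed for a single-register file
    result_file_bits: int = 0       # not needed if only one file can be written

    def width(self) -> int:
        """Total number of bits required to encode the instruction."""
        return (self.immediate_bits + self.opcode_bits
                + self.result_register_bits + self.result_file_bits)


# Immediate load encoded for an ordinary issue slot (assumed field widths).
conventional = ImmediateLoadFormat(immediate_bits=32, opcode_bits=6,
                                   result_register_bits=5, result_file_bits=1)

# Immediate load encoded for the dedicated issue slot: immediate value only.
dedicated = ImmediateLoadFormat(immediate_bits=32)

print(conventional.width(), dedicated.width())
```

Under these assumed widths the dedicated instruction is exactly as wide as the immediate value, whereas the conventional encoding carries twelve further bits of operation code and destination fields.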

In some embodiments, the register file segments RF₀ and RF₁ are distributed register files, i.e. several register files, each for a limited set of issue slots, are used instead of one central register file for all issue slots UC₀-UC₃. An advantage of a distributed register file is that it requires fewer read and write ports per register file segment, resulting in a smaller register file bandwidth. Furthermore, it improves the scalability of the processor when compared to a central register file.

In some embodiments, the connection network CN is a partially connected network, i.e. not each issue slot UC₀-UC₃ is coupled to each register file segment RF₀ and RF₁. The use of a partially connected communication network reduces the code size as well as the power consumption, and also allows increasing the performance of the processor. Furthermore, it improves the scalability of the processor when compared to a fully connected connection network.

Referring to FIG. 2, a schematic block diagram shows an embodiment of issue slot UC₂. Issue slot UC₂ comprises a decoder DEC, a time shape controller TSC, an input routing network IRN, an output routing network ORN, and a plurality of functional units FU₂₀, FU₂₁ and FU₂₂. The decoder DEC decodes the control word CW applied to the issue slot in each clock cycle. Results of the decoding step are operand register indices ORI, which refer to the registers in the register file segments RF₀ and RF₁ where the operand data for the operation to be executed are stored. Further results of the decoding step are result file indices RFI and result register indices RRI, which refer to the registers in the register file segments RF₀ and RF₁ where the result data have to be stored. The decoder DEC passes the indices ORI, RFI and RRI to the time shape controller TSC, via couplings I. The time shape controller TSC delays the indices ORI, RFI and RRI by the proper amount, according to the input/output behavior of the functional unit on which the operation is executed, and passes the indices to the connection network CN, shown in FIG. 1. If the VLIW instruction comprises instructions for issue slot UC₂, the decoder DEC selects one of the functional units FU₂₀, FU₂₁ or FU₂₂, via a coupling SEL, to perform an operation. Furthermore, the decoder DEC passes information on the type of operation that has to be performed to that functional unit, using a coupling OPT. The input routing network IRN passes the operand data OPD to the functional units FU₂₀, FU₂₁ and FU₂₂, via couplings ID. The functional units FU₂₀, FU₂₁ and FU₂₂ pass their result data to the output routing network ORN via couplings OD, and subsequently the output routing network ORN passes the result data RD to the connection network CN, shown in FIG. 1. Via connection network CN the result data RD can be stored in register file segments RF₀ and RF₁. An instruction valid bit IV is passed to the decoder DEC and in case of a NOP operation, this bit is equal to zero. The decoder DEC uses this information during selection of a functional unit and if the instruction valid bit IV is equal to zero, no result data RD are written to the output routing network ORN by that functional unit, to avoid contamination of data stored in the register file segments RF₀ and RF₁. The instruction valid bit will be further discussed with reference to FIG. 4. For performing custom operations, functional units FU₂₀ and FU₂₁ can handle three operand data, and functional units FU₂₁ and FU₂₂ can produce two output data. Furthermore, the functional units may use a large immediate value as an operand.

Embodiments of issue slots UC₀, UC₁ and UC₃ are not shown. These issue slots also comprise a set of functional units, capable of executing RISC like instructions or more complex instructions, requiring more than two operands and/or producing more than one result data. These functional units may also require either small or large immediate values as operand data.

Referring to FIG. 3, an advantageous embodiment of dedicated issue slot UC₄ is shown. Dedicated issue slot UC₄ comprises a single functional unit IMU, which is only capable of executing the dedicated instruction, and a decoder DEC₄. The dedicated instruction issued to dedicated issue slot UC₄ only comprises a large immediate value, and no bits for encoding of operand data registers, register file index and result register indices, and operation code are required. The corresponding control word CW₄ only comprises a large immediate value as well. The control word CW₄ is applied to the decoder DEC₄, via coupling 301, and the corresponding large immediate value IMV is directly routed to an output of decoder DEC₄, via coupling 303. The large immediate value IMV is further passed to the immediate data input of functional unit IMU, via coupling 305. Inside the functional unit IMU, the large immediate value IMV is directly routed to the result data output of the functional unit, via coupling 307. Finally, the large immediate value IMV is passed to the dedicated register file RF₂, via coupling 309. An instruction valid bit IV₄ is received by decoder DEC₄ and passed to the functional unit select input via couplings 311, 313 and 315. The result valid output of functional unit IMU is directly connected to the functional unit select input, via coupling 317. In case of a NOP operation, the instruction valid bit IV₄ is equal to zero. The result valid output bit RV₄ will then be equal to zero too and as a result the large immediate value IMV is not passed to the dedicated register file RF₂. The instruction valid bit IV₄ will be further discussed with reference to FIG. 4. In fact, the decoder DEC₄ and the functional unit IMU are empty. The large immediate value IMV, corresponding to control word CW₄, is directly routed to the result data output of functional unit IMU, via couplings 301, 303, 305 and 307. The instruction valid bit IV₄ is directly routed to the result valid output of functional unit IMU, via couplings 311, 313, 315 and 317. Issue slot UC₄ does not require an input or output routing network. It does not require a time shape controller either, since no result file index and result register index are required.
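The pass-through behavior described for dedicated issue slot UC₄ can be modeled, purely as an illustrative sketch, as follows. The class and function names are hypothetical, and the model abstracts the couplings 301-317 into plain assignments; it is a behavioral approximation rather than a description of the actual hardware.

```python
from typing import Optional


class DedicatedRegisterFile:
    """Model of a single-register file written only by the dedicated issue slot."""

    def __init__(self) -> None:
        self.value: Optional[int] = None

    def write(self, value: int) -> None:
        self.value = value


def dedicated_issue_slot(control_word: int, instruction_valid: int,
                         register_file: DedicatedRegisterFile) -> int:
    """Route the immediate value and the valid bit straight through.

    The control word holds nothing but the immediate value, so there are no
    operation code or index fields to decode. Returns the result valid bit.
    """
    immediate = control_word          # decoder and functional unit are "empty"
    result_valid = instruction_valid  # valid bit is wired to the result valid output
    if result_valid:
        register_file.write(immediate)
    return result_valid


rf2 = DedicatedRegisterFile()
first = dedicated_issue_slot(0xDEADBEEF, 1, rf2)   # dedicated instruction
second = dedicated_issue_slot(0x00001234, 0, rf2)  # NOP: register file untouched
```

In this sketch a NOP operation leaves the previously loaded value in place, which mirrors the statement that the immediate value is not passed to the dedicated register file when the instruction valid bit is zero.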

Referring to FIG. 4, examples of VLIW instructions are shown for the VLIW processor shown in FIG. 1. The VLIW instruction 401 comprises five control words 411-419. Onto control words 413-419 an instruction is mapped for a corresponding issue slot UC₀, UC₁, UC₂ and UC₃, respectively. Onto control word 411 a dedicated instruction is mapped for dedicated issue slot UC₄. The control words 413, 415 and 419 comprise a NOP operation, associated with issue slots UC₀, UC₁ and UC₃ respectively. Control word 411 comprises dedicated instruction IMM and control word 417 comprises instruction InstrC, corresponding to dedicated issue slot UC₄ and issue slot UC₂, respectively. For an advantageous embodiment, the instruction format for instruction IMM is shown by instruction 403, which only comprises a large immediate value IMV1. A possible instruction format for instruction InstrC is shown by instruction 405, having six fields 421-431, associated with an operation code OC1, two result register indices D1 and D2 and three operand register indices S1, S2 and S3, respectively. Different instruction formats may be used for different instructions mapped onto the same control word 411-419. The instruction format may be varied, e.g. depending on the type of instruction that has to be executed, or on the use of an immediate value as an operand as opposed to retrieving one from the register files. For example, in instruction 405 more bits could be spent on encoding more operations or small immediate values, and fewer on operand register indices. Control word 411 may comprise a NOP operation, in case no immediate value has to be loaded by dedicated issue slot UC₄.

The uncompressed VLIW instruction 401 can be compressed by encoding the NOP operations using a set of dedicated bits. An example of a compressed instruction, after compressing VLIW instruction 401, is shown by VLIW instruction 407, which comprises a field 433 having a set of dedicated bits, and control words 435 and 437 having instructions IMM and InstrC, respectively. Single bits in the set of dedicated bits encode the NOP operations mapped onto the control words 413, 415 and 419 of VLIW instruction 401. A bit ‘0’ refers to a NOP operation and the position of the bit in the field 433 denotes the control word within VLIW instruction 401 that holds this NOP operation. The ‘0’ bits at positions two, three and five within field 433 refer to the NOP operations present in VLIW instruction 401 in control words 413, 415 and 419, respectively. A bit ‘1’ present in field 433 refers to an instruction having a non-NOP operation, and the position of the bit in the field 433 points to the control word within VLIW instruction 401 onto which the instruction is mapped. The ‘1’ bits at positions one and four within field 433 refer to the instructions IMM and InstrC in control words 411 and 417, respectively. In other embodiments, different ways of compressing VLIW instructions may be applied, as known by the person skilled in the art.
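The NOP compression scheme described above can be sketched as follows. The representation is an assumption made for illustration: a VLIW instruction is modeled as a list of five control words in the order 411, 413, 415, 417, 419, with `None` standing for a NOP operation, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

ControlWord = Optional[str]  # None stands for a NOP operation


def compress(vliw: List[ControlWord]) -> Tuple[List[int], List[str]]:
    """Return (dedicated bits, remaining control words) for a VLIW instruction."""
    header = [0 if word is None else 1 for word in vliw]
    payload = [word for word in vliw if word is not None]
    return header, payload


def decompress(header: List[int], payload: List[str]) -> List[ControlWord]:
    """Re-insert NOP control words at the positions marked by '0' bits."""
    words = iter(payload)
    return [next(words) if bit else None for bit in header]


# VLIW instruction 401: IMM for UC4, NOPs for UC0 and UC1, InstrC for UC2, NOP for UC3.
instruction_401 = ["IMM", None, None, "InstrC", None]
field_433, control_words = compress(instruction_401)
restored = decompress(field_433, control_words)
```

With this input the dedicated bits are 1, 0, 0, 1, 0, matching the ‘1’ bits at positions one and four and the ‘0’ bits at positions two, three and five of field 433, and decompression restores the original five control words.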

The VLIW processor shown in FIG. 1 is capable of executing instructions requiring large immediate values. During compilation of the software program to run on the VLIW processor, the compiler recognizes constants present in the software program. In case these constants cannot be encoded as an immediate value in a single instruction, the compiler may create a dedicated instruction for loading of this immediate value. In the instruction requiring the immediate value, the address of dedicated register file RF₂ is encoded, for retrieving the immediate value. For example, for loading of large immediate value IMV1 required by instruction InstrC, the dedicated instruction IMM is encoded. The large immediate value IMV1 is then replaced by the address of register file RF₂ in instruction 405, as operand register index S3. In this way an efficient encoding of VLIW instructions is obtained, since the instruction width is allowed to decrease. In an advantageous embodiment, the VLIW instruction is a compressed VLIW instruction 407, in order to further reduce the code size. The dedicated bits in field 433 can be simultaneously used as instruction valid bits IV and IV₄, for their corresponding issue slot. The compressed VLIW instruction 407 is decompressed by decompression logic present in controller SQ, by adding control words comprising NOP operations to the VLIW instruction 407, using the ‘0’ bits and their position in field 433. When executing the VLIW instruction 401, the large immediate value IMV1 is loaded by dedicated issue slot UC₄ and stored in dedicated register file RF₂. The issue slot UC₂, when executing instruction InstrC, retrieves this value from the dedicated register file RF₂, via the connection network CN, using operand register index S3. An efficient decoding process is obtained, as the instructions are decoded by decoder DEC, without requiring any additional decoding steps. The hardware of dedicated issue slot UC₄ can be very simple, by means of wiring without requiring any control logic. In case loading of a large immediate value is not required in a particular VLIW instruction, a NOP operation is mapped onto control word 411.
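The compiler step described above, in which a constant that does not fit in an instruction is moved to a dedicated instruction and replaced by a reference to the dedicated register file, might be sketched as follows. All names, the assumed 8-bit small immediate field, and the restriction to one large constant per instruction are assumptions of this sketch and are not specified in the description; scheduling and pipeline timing are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Operand = Union[str, int]    # register name, or integer constant

SMALL_IMMEDIATE_BITS = 8     # assumed width of an in-instruction immediate field
DEDICATED_REGISTER = "RF2"   # operand index that addresses the dedicated register file


@dataclass
class Instruction:
    opcode: str
    operands: List[Operand]


def fits_small_immediate(value: int, bits: int = SMALL_IMMEDIATE_BITS) -> bool:
    """True if an unsigned constant fits in the assumed in-instruction field."""
    return 0 <= value < (1 << bits)


def lower_large_immediate(instr: Instruction) -> Tuple[Optional[int], Instruction]:
    """Return (payload of a dedicated IMM instruction or None, rewritten instruction)."""
    imm_payload: Optional[int] = None
    new_operands: List[Operand] = []
    for operand in instr.operands:
        if isinstance(operand, int) and not fits_small_immediate(operand):
            if imm_payload is not None and imm_payload != operand:
                raise ValueError("this sketch supports one large immediate per instruction")
            imm_payload = operand
            new_operands.append(DEDICATED_REGISTER)  # constant replaced by RF2 address
        else:
            new_operands.append(operand)
    return imm_payload, Instruction(instr.opcode, new_operands)


imm, instr_c = lower_large_immediate(Instruction("InstrC", ["r1", "r2", 0x12345678]))
no_imm, add = lower_large_immediate(Instruction("add", ["r1", 5]))
```

In this sketch the third operand of InstrC, which plays the role of operand register index S3, becomes a reference to RF₂ and the constant becomes the payload of the dedicated instruction, while an instruction whose constant fits the small immediate field is left unchanged and no dedicated instruction is created.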

A superscalar processor also comprises multiple issue slots that can perform multiple operations in parallel, as in case of a VLIW processor. However, the processor hardware itself determines at runtime which operation dependencies exist and decides which operations to execute in parallel based on these dependencies, while ensuring that no resource conflicts will occur. The principles of the embodiments for a VLIW processor, described in this section, also apply for a superscalar processor. In general, a VLIW processor may have more issue slots in comparison to a superscalar processor. The hardware of a VLIW processor is less complicated in comparison to a superscalar processor, which results in a better scalable architecture. The number of issue slots and the complexity of each issue slot, among other things, will determine the amount of benefit that can be reached using the present invention.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim. The word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

1. A processing apparatus conceived for processing data, based on control signals generated from a set of instructions being executed in parallel, comprising: a plurality of issue slots, wherein each issue slot comprises a plurality of functional units, the plurality of issue slots being controlled by a set of control words, corresponding to the set of instructions, characterized in that the processing apparatus further comprises a dedicated issue slot arranged for loading an immediate value in dependence upon a dedicated instruction comprising the immediate value.
2. An apparatus according to claim 1, wherein the dedicated issue slot comprises a single functional unit arranged for only executing the dedicated instruction.
3. An apparatus according to claim 1, further comprising a dedicated register file for storing said immediate value, the dedicated register file being accessible by the dedicated issue slot.
4. An apparatus according to claim 1, wherein said processing apparatus is a VLIW processor and wherein said set of instructions is grouped in a VLIW instruction.
5. An apparatus according to claim 4, wherein the VLIW instruction is a compressed VLIW instruction, comprising dedicated bits for encoding NOP operations.
6. An apparatus according to claim 1, further comprising a register file associated with the plurality of issue slots.
7. An apparatus according to claim 6, further comprising a connection network for coupling the plurality of issue slots and the register file.
8. A method for processing data, said method comprising the following steps: storing input data in a register file; processing data retrieved from the register file based on control signals generated from a set of instructions being executed in parallel, using a plurality of issue slots controlled by a set of control words being generated from the set of instructions; and wherein each issue slot comprises a plurality of functional units, characterized in that the method further comprises a step of loading an immediate value into a dedicated issue slot in dependence upon a dedicated instruction comprising the immediate value.
9. Instruction set, comprising a plurality of instructions for execution by a processing apparatus according to the preamble of claim 1, characterized in that the instruction set further comprises a dedicated instruction having an immediate value, which dedicated instruction when executed by a dedicated issue slot causes the dedicated issue slot to load the immediate value.
10. A computer program comprising computer program code means for instructing a computer system to perform the steps of the method according to claim 8.
11. A compiler program product for generating a sequence of sets of instructions being arranged for execution by a processing apparatus according to the preamble of claim 1, characterized in that the sequence of sets of instructions further comprises a dedicated instruction having an immediate value, which dedicated instruction when executed by a dedicated issue slot causes the dedicated issue slot to load the immediate value.
12. An information carrier comprising a sequence of sets of instructions being arranged for execution by a processing apparatus according to the preamble of claim 1, characterized in that the sequence of sets of instructions further comprises a dedicated instruction having an immediate value, which dedicated instruction when executed by a dedicated issue slot causes the dedicated issue slot to load the immediate value.